Advances in the use of autoregressive models, pattern recognition methods, and 
hidden Markov models for on-line health monitoring of dynamic systems (such as 
DSN antennas) have recently been reported. However, the algorithms described in 
previous work have the significant drawback that data acquired under fault condi- 
tions are assumed to be available in order to train the model used for monitoring 
the system under observation. This article reports that this assumption can be 
relaxed and that hidden Markov monitoring models can be constructed using only 
data acquired under normal conditions and prior knowledge of the system char- 
acteristics being measured. The method is described and evaluated on data from 
the DSS 13 34-m beam waveguide antenna. The primary conclusion from the ex- 
perimental results is that the method is indeed practical and holds considerable 
promise for application at the 70-m antenna sites where acquisition of fault data 
under controlled conditions is not realistic. 


I. Introduction and Background 

In previous articles, the problem of on-line health monitoring of a dynamic
system (in particular, a DSN 34-m beam waveguide [BWG] antenna) has been
investigated [1-3]. The problem can be stated in the following simple manner:
let the observed data be denoted by
$X(t) = \{x(t), x(t - \tau), \ldots, x(0)\}$, where each of the $x(t)$ is a
$k$-dimensional vector measurement of sensor data sampled at discrete time
intervals $\tau$. Given $X(t)$, the problem is to determine the most likely
current state of the system at time $t$, where the system is assumed to be in
one of $m$ states $\{\omega_1, \ldots, \omega_m\}$. The states are not directly
observable, but can be inferred from the observable data $X(t)$. In
probabilistic terms, the modelling goal is to accurately model
$p[\omega_i(t) \mid X(t)]$ (either from prior knowledge, training data, or a
combination of both), while for prediction, $p[\omega_i(t) \mid X(t)]$ is used
to predict the current state given a specific set of data $X(t)$ for which the
system state is unknown. Typically $\omega_1$ corresponds to the normal
operating state of the system, while the other states represent various system
faults that may occur. The quality of a particular model for
$p[\omega_i(t) \mid X(t)]$ can be assessed by measuring an empirical estimate
of the prediction accuracy, which is simply the percentage of time that the
state predicted by the model agrees with the true state. The test is performed
over a period of time where the system cycles through various states (not known
to the model) using data that are independent of those on which the model was
trained.
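
The prediction accuracy defined above can be illustrated with a short sketch. The function name, the array layout (one row of state probabilities per time step), and the toy numbers are assumptions made for illustration only; they are not taken from the experiments reported in this article.

```python
import numpy as np

def prediction_accuracy(posteriors, true_states):
    """Percentage of time steps at which the most probable state is the true one.

    posteriors  : array of shape (T, m); row t holds the estimated state
                  probabilities at time step t.
    true_states : length-T array of true state indices (0-based).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    true_states = np.asarray(true_states)
    predicted = posteriors.argmax(axis=1)  # most likely state at each step
    return 100.0 * np.mean(predicted == true_states)

# Toy example with two states and four time steps (made-up numbers).
post = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
truth = np.array([0, 1, 1, 1])
accuracy = prediction_accuracy(post, truth)  # the third step is misclassified
```

In practice the test sequence would come from data that are independent of the training data, as the text requires.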

In [1] and [2], an autoregressive-exogenous (ARX) time series model coupled to
a pattern recognition component was used as the basis for estimating
$p[\omega_i \mid x(t)]$. This is a relatively simple model providing state
estimates based only on instantaneous measurements $x(t)$ but ignoring past
data. This model resulted in prediction accuracies of about 90 percent on
independent test data sets obtained at DSS 13. A significant improvement on
this method was reported in [3] whereby the past data were used in the state
estimates by embedding the problem in a hidden Markov model (HMM) framework.
The key point of the HMM method is that prior knowledge regarding the temporal
behavior of the states can be used to effectively model temporal correlations
in the system at the state level. On-line tests of this method at DSS 13 in
November 1991 resulted in no prediction errors during a 1-hr test with state
estimates being provided by the model every 6 sec [3].

It should be pointed out that the autoregressive and hidden Markov modelling
methods are not the only approaches to the fault detection problem. In [4] a
number of statistical change detection methods were investigated. It was found
that change detection methods require significant prior knowledge of the
behavior of parameter characteristics when the system enters a fault state.
This type of detailed prior knowledge is unlikely to be available, which limits
the applicability of these methods in practice.

II. Limitations of Previously Reported Methods 

While the models described in [1-3] display useful ca- 
pabilities in terms of on-line fault detection, they suffer 
from two major limitations: 

(1) The models assume that the known states are exhaustive, i.e., the set of
states $\{\omega_1, \ldots, \omega_m\}$ covers all possible states in which the
system may be.

(2) The models also require that labelled training data are available for each
state, i.e., for each state $\omega_i$ there is a set of data
$\{x(t), x(t - \tau), \ldots, x(0)\}$ which was measured when the system was
known to be in state $\omega_i$.

Clearly, these requirements cannot both be satisfied in most real-world fault
detection applications. For fault detection, the assumption that all system
states due to faults can be specified in advance is inappropriate except for
the simplest of systems: real-world systems (such as DSN antenna pointing
systems) often contain large numbers of interacting components with feedback
and nonlinearities, making prior prediction of all possible system behaviors
under fault conditions unrealistic. However, it is usually possible to model
system behavior under a small set of likely system faults; this point will be
expanded upon later in this article.


The second requirement, that training data are avail- 
able for each possible system state, is coupled to the first 
assumption: if all possible states cannot be described in 
advance, then the notion of having training data for such 
states is moot. However, even if the first assumption were 
satisfied and all fault states could be described in advance, 
the requirement that data can be recorded when the sys- 
tem is in each of these states is often unrealistic. A good 
example is a DSN 70-m antenna where hardware simula- 
tion of fault conditions is not a practical option due to 
operational considerations (as compared to the DSN 34-m 
antenna at DSS 13). 

Hence, there is considerable practical motivation to de- 
velop methods that relax the assumptions on which the 
earlier-reported models are based, while still retaining the 
accurate prediction capabilities of these models. This arti- 
cle describes a relatively simple yet effective method that 
can detect the presence of states for which no training data 
sets were available, i.e., states about which the model has 
no knowledge. It is assumed that training data (or else a 
strong prior model) for at least one state is available — this 
is not restrictive since data under normal conditions are 
almost always available. The proposed method is based on 
the use of prior knowledge to constrain the possible distri- 
bution of system parameters, which when coupled with the 
model derived from the training data, allows detection of 
both known states and a generic, unknown state category. 

This article outlines the general model, illustrates its 
use and effectiveness on data collected from the elevation 
axis of the DSS 13 BWG antenna pointing servomech- 
anism, and describes the limitations of the current ap- 
proach. 

III. Notation and Assumptions 

For the purposes of this article, a distinction is made between the observable
data at time $t$, which is $x(t)$, and the estimated parameters at time $t$,
denoted by the vector $\theta(t)$. Typically $x(t)$ is the original sensor data
or time series (such as the motor current in an antenna pointing system),
whereas the values of $\theta(t)$ are typically statistical estimates of some
characteristics of the time series such as the mean, variance, or
autoregressive (AR) coefficients. In this article, attention will be limited to
block estimation methods whereby
$\theta(t) = f[x(t), x(t - \tau), \ldots, x(t - N\tau)]$, etc. Hence, each of
the parameter estimates is derived from disjoint windows or blocks of the
original data, where $N$ is chosen to be large enough to enable reasonably
reliable statistical estimates.

Let $\Phi_t = \{\theta(t), \theta(t - N\tau), \ldots, \theta(0)\}$. In effect,
$\Phi_t$ is then viewed as the observable data sequence and the problem can be
treated as that of recovering the likely system states given the estimates
$\Phi_t$, i.e., find $p[\omega_i(t) \mid \Phi_t]$. Issues such as choosing
appropriate estimators, block sizes, etc., will not be dealt with in this
article. For the experimental results reported later in this article, values of
$\tau = 20$ msec and $N = 200$ are used. However, for the purposes of
simplification of notation, it will be assumed without loss of generality that
$N\tau = 1$ during the development of the probabilistic models that follow.
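
As an illustration of block estimation, the following sketch computes one parameter vector per disjoint block of 200 samples. The article does not specify an estimator, so the least-squares AR(2) fit, the function name `ar2_block_features`, and the simulated test signal are all assumptions; the choice of two AR coefficients plus the standard deviation follows Section V.

```python
import numpy as np

def ar2_block_features(x, block_len=200):
    """Return one row (phi1, phi2, sigma) per disjoint block of the series x.

    ASSUMPTION: AR(2) coefficients are estimated by ordinary least squares on
    the mean-removed block; the article does not prescribe an estimator.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_len
    feats = np.empty((n_blocks, 3))
    for b in range(n_blocks):
        block = x[b * block_len:(b + 1) * block_len]
        sigma = block.std()
        z = block - block.mean()
        # Regress z[t] on z[t-1] and z[t-2].
        design = np.column_stack([z[1:-1], z[:-2]])
        target = z[2:]
        phi, *_ = np.linalg.lstsq(design, target, rcond=None)
        feats[b] = (phi[0], phi[1], sigma)
    return feats

# Simulated AR(2) signal (made-up coefficients 0.5 and -0.3, noise std 0.02).
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.02, size=2000)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + noise[t]

features = ar2_block_features(x)  # 10 blocks, 3 parameters each
```

With a sampling interval of 20 msec, each block of 200 samples spans 4 sec of data, so one parameter vector becomes available every 4 sec.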

It is also assumed that there are $m - 1$ states for which prior information is
available in the form of either (1) specific parametric models for the
dependence of the states on the observable data, or (2) training data. An
additional $m$th state is used in the model as a single state which accounts
for all other possible behaviors of the system that are qualitatively different
from the known states. This state will be referred to as the unknown fault
state. Hence, in the simplest case, for example, if prior information is only
available for the normal state, then the model has two states: normal and the
unknown fault state.


IV. The General Model 

The goal of the modelling process is to provide a means 
of estimating the posterior state probabilities 


$$p[\omega_i(t) \mid \Phi_t] = p[\omega_i(t) \mid \theta(t), \theta(t-1), \ldots, \theta(0)], \qquad 1 \le i \le m \tag{1}$$


which are required for prediction. In the Appendix it is 
shown how the hidden Markov framework can be used such 
that the full number of conditioning terms in Eq. (1) is not 
necessary if the appropriate assumptions are met. The 
hidden Markov model leads to recursive estimates of the 
form 


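$$p[\omega_i(t) \mid \theta(t), \theta(t-1), \ldots, \theta(0)] \propto p[\theta(t) \mid \omega_i(t)] \times \sum_{j=1}^{m} a_{ij}\, p[\omega_j(t-1) \mid \theta(t-1), \ldots, \theta(0)] \tag{2}$$

so that knowledge of the likelihood $p[\theta(t) \mid \omega_i(t)]$ at each
time $t$ (in addition to the Markov transition matrix $A$, with elements
$a_{ij}$) is sufficient to calculate the posterior estimates.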

Note that it will be assumed that the statistics of inter- 
est are time-invariant, hence reference to a specific time t 
can be dropped at this point. 


In previous work, direct forward models of $p(\omega_i \mid \theta)$ were
estimated and then $p(\theta \mid \omega_i)$ was estimated by the use of Bayes'
rule in Eq. (2) [3]. In this article, it is proposed to use models of the form
$p(\theta \mid \omega_i)$ as the direct basis for the model. The rationale
behind this approach is simple: based on prior knowledge alone it is impossible
except in simple cases to specify the form of $p(\omega_i \mid \theta)$.
However, it is much more likely that one can model the dependence of the data
on the state, i.e., a prior density can be assigned to the likelihood
$p(\theta \mid \omega_i)$ based on prior knowledge. In particular, for state
$\omega_m$, which is the state that covers all possible states not included in
the set $\{\omega_1, \ldots, \omega_{m-1}\}$, one can typically specify a
noninformative uniform prior density over the set of possible parameter values
for $\theta$. In addition, one must also supply models for
$p(\theta \mid \omega_i)$, $1 \le i \le m - 1$, which are typically estimated
from training data.

The key difference between this method and those methods proposed in previous
literature is that the model works with likelihoods (the probability of the
observable data given the states), rather than directly with the posterior
state probabilities (the probabilities of the states given the data). This
approach rules out the use of many discriminant-based methods that only provide
estimates of the posterior probabilities, but do not provide estimates of the
likelihoods (for example, logistic regression, feed-forward neural networks,
decision trees, etc.). Methods that provide the required estimates include
(naturally) both parametric methods (such as maximum likelihood classifiers
based on a specific parametric form for $p(\theta \mid \omega_i)$) and
nonparametric methods such as kernel density estimators. For a more extensive
general discussion of the differences between such models, see [5-7].

The proposed method can be summarized as follows: 

(1) Specify or estimate prior density models, $p(\theta \mid \omega_i)$, for
the known classes, $\omega_1, \ldots, \omega_{m-1}$. As mentioned above, this
requires the use of either a parametric model (such as a multivariate Gaussian)
or a nonparametric density estimation method.

(2) Specify a prior density for $p(\theta \mid \omega_m)$ where $\omega_m$ is
the special unknown state. This is typically done by establishing bounds or
constraints on each of the parameters in $\theta$ and then (in the absence of
any other information) specifying a uniform density over the bounded parameter
space.

(3) The remainder of the method is the same as before: simply estimate the
hidden Markov model parameters from reliability data (as described in the
Appendix) and run the model for prediction.

Note that in step (2) it is important that the derived parameters can be
bounded in some manner. The stronger the constraints that can be placed on the
parameters, the better will be the detection performance of the model. These
constraints could be due to the basic physics of the system, such as energy
limits, or a function of the particular representation being used, such as
spectral or autoregressive estimates (a specific example is provided in
Section V). If there are no constraints at all, then it is still possible to
specify a prior model, such as a Gaussian model, although the choice of model
may now be somewhat more problematic since it will inevitably reflect a prior
bias which may not be appropriate. A better alternative (in the case of no
constraints) would be not to use a prior model at all for $\omega_m$ and just
detect data which appear to be outliers from the other $m - 1$ models. However,
outlier detection can be problematic. It is a central theme of this article
that prior constraints can usually be placed on the parameter space of interest
and that this provides the natural avenue for detecting data from $\omega_m$.
In essence, it is argued that if such prior constraints exist, this information
should be used in the model, and should in principle provide better detection
capabilities than any outlier detection method.

V. Applying the Likelihood Method in Practice 

One significant difference between modelling the like- 
lihoods and posterior probabilities is the issue of dimen- 
sionality, namely, that a high-dimensional parameter space 
will be potentially more problematic for the likelihood 
modelling method than for the posterior (or discriminant) 
modelling method. In a discriminant model (which cal- 
culates posterior probabilities), input dimensions can be 
ignored in the model if they are irrelevant to the state, al- 
lowing more efficient estimation at small training sample 
sizes. However, in the likelihood model, all input dimen- 
sions must be included in the model. If there are a signif- 
icant number of irrelevant or redundant inputs, this can 
lead to a poor model, particularly as the ratio of sample 
size to input dimensions gets small. Hence, parsimony in 
parameter choice is recommended. 

From previous work with the DSS-13 BWG-antenna pointing system, it has been
found that autoregressive coefficients and standard deviation estimates are
both particularly useful characteristics of the motor current for the purpose
of detecting abnormal events [2,3]. In this article, three such characteristics
as estimated from the motor current signal will be chosen: the two coefficients
of a second-order autoregressive model [AR(2)], $\phi_1$ and $\phi_2$, and the
standard deviation, $\sigma$. Hence, $\theta = (\phi_1, \phi_2, \sigma)$. In
[2] and [3], an eighth-order ARX model was used to model the motor current
signal, using the rate command as the forcing term. However, in the interests
of keeping the input dimensionality relatively low, a simpler AR(2) model was
used for the purposes of this experiment. While the simpler model is not
appropriate for complete system identification, it is sufficient for the
purposes at hand to extract useful signal characteristics that can be used to
discriminate between normal and abnormal operating conditions.

The next step is to specify a prior density over the AR parameters $\phi_1$ and
$\phi_2$. In accordance with standard time series theory, if the estimated
process (as represented by the two coefficients) is to be stationary, then the
coefficients must obey the following restrictions [8]:

$$\phi_1 + \phi_2 < 1$$

$$\phi_2 - \phi_1 < 1$$

$$-1 < \phi_2 < 1$$

It will be assumed that the estimated process is in fact stationary, thus
providing bounds on the possible parameter values (see Fig. 1). A uniform
density is specified over all such allowable values of $\phi_1$ and $\phi_2$.
Of course, this does not allow for the fact that in practice (and in particular
for fault conditions) there is no guarantee that the estimated coefficients
will obey these bounds. The following approach is adopted: if the estimated
coefficients lie outside the bounds of the stationary region, then the
probability of the normal state $p(\omega_1 \mid \theta)$ is set to zero.
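
The stationarity bounds and the uniform density over them can be sketched as follows. The three restrictions describe a triangle in the plane of the two AR(2) coefficients with vertices (-2, -1), (2, -1), and (0, 1), so its area is 4 and a uniform density over it equals 1/4. The function names are hypothetical and the code is only a sketch of the idea, not the implementation used for the reported experiments.

```python
def in_stationary_region(phi1, phi2):
    """True if (phi1, phi2) satisfies the three AR(2) stationarity restrictions."""
    return (phi1 + phi2 < 1.0) and (phi2 - phi1 < 1.0) and (-1.0 < phi2 < 1.0)

# Area of the triangle with vertices (-2, -1), (2, -1), (0, 1):
# base 4, height 2, area 0.5 * 4 * 2 = 4.
TRIANGLE_AREA = 4.0

def uniform_ar2_density(phi1, phi2):
    """Uniform density over the stationary region, zero outside it."""
    return 1.0 / TRIANGLE_AREA if in_stationary_region(phi1, phi2) else 0.0

inside = in_stationary_region(0.5, -0.3)    # a stationary AR(2) process
outside = in_stationary_region(1.5, 0.2)    # violates phi1 + phi2 < 1
density = uniform_ar2_density(0.5, -0.3)
```

An estimate that fails `in_stationary_region` is the case in which the text sets the probability of the normal state to zero.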

The third parameter, the standard deviation of the voltage from the Hall effect
sensor, which measures motor current, is about 20 mV under normal conditions.
Based on experience from observing the motor current signal under a variety of
conditions, it is estimated that under any fault condition the standard
deviation should not exceed 1 V. Hence, in the absence of any other prior
information, a uniform density is placed on the standard deviation $\sigma$
over the range 0 to 1 V. This density is assumed independent of the AR(2)
coefficient density. This completes the specification of the likelihood model
for $\omega_m$.

For the other $m - 1$ states, normal and any known fault conditions, likelihood
models can be found via the use of Gaussian assumptions with maximum likelihood
parameter estimation or nonparametric density estimation.
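
For the Gaussian option, a minimal sketch of maximum likelihood fitting and density evaluation is given below. The helper names `fit_gaussian` and `gaussian_density` and the synthetic training data are assumptions for illustration; the maximum likelihood covariance estimate divides by the number of samples, which is why `bias=True` is passed to `numpy.cov`.

```python
import numpy as np

def fit_gaussian(train):
    """Maximum likelihood mean and covariance from rows of parameter vectors."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False, bias=True)  # ML estimate divides by n
    return mean, cov

def gaussian_density(theta, mean, cov):
    """Multivariate Gaussian density evaluated at the parameter vector theta."""
    d = np.asarray(theta, dtype=float) - mean
    k = mean.size
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return float(np.exp(-0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)))

# Synthetic "normal state" training vectors (phi1, phi2, sigma); made-up values.
rng = np.random.default_rng(1)
train = rng.normal(loc=[0.5, -0.3, 0.02], scale=[0.05, 0.05, 0.002], size=(500, 3))

mean, cov = fit_gaussian(train)
peak = gaussian_density(mean, mean, cov)  # density at the fitted mean
```

One such model would be fitted per known state, each from the training data labelled with that state.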

VI. Experimental Results 

In [2] the acquisition of data at DSS 13 was described. 
Specifically, sensor data were measured under controlled 
conditions from the elevation axis servomechanism of the 
34-m BWG antenna. Data are available for two different 
days of antenna operation, referred to as day 42 and day 
53. Data were recorded for about 30 min with four differ- 
ent fault conditions present. The faults are: tachometer 
noise, tachometer failure, compensation loss in the ampli- 
fier, and encoder failure. The fourth fault, encoder failure, 
was a real fault that was subsequently repaired. It shows 
up in the data as being intermittent in nature. The other 
three faults were purposely introduced into the hardware 
in a controlled manner. 

The same model for the prior likelihood $p(\theta \mid \omega_m)$ as described
in Section V was used in all experiments. The Markov transition matrix was set
to have probability of 0.99998 of remaining in the normal state, which
corresponds to a mean time between failures of about 2 days. The probability of
transiting to any particular fault state was set uniformly, and the probability
of remaining in a fault state was set to 0.95 (corresponding to a mean failure
duration of 1 min before shutdown occurred).
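
The relationship between these self-transition probabilities and mean durations can be checked with a small sketch. Two things here are assumptions rather than statements from the article: that one state step corresponds to one block of 200 samples at 20 msec (4 sec), and that the probability mass leaving a fault state is spread evenly over the other states (the article only specifies the 0.95 self-transition). The matrix follows the Appendix convention that element (i, j) is the probability of moving to state i from state j, so columns sum to 1.

```python
import numpy as np

def build_transition_matrix(n_states, p_stay_normal, p_stay_fault):
    """A[i, j] = probability of state i at time t given state j at t-1.

    State 0 is the normal state; the remaining states are fault states.
    """
    n_faults = n_states - 1
    A = np.zeros((n_states, n_states))
    A[0, 0] = p_stay_normal
    # Leaving the normal state: uniform over the fault states (as in the text).
    A[1:, 0] = (1.0 - p_stay_normal) / n_faults
    for j in range(1, n_states):
        A[j, j] = p_stay_fault
        # ASSUMPTION: mass leaving a fault state is split evenly over the others.
        others = [i for i in range(n_states) if i != j]
        A[others, j] = (1.0 - p_stay_fault) / len(others)
    return A

def mean_dwell_time(p_stay, step_seconds):
    """Mean time spent in a state with self-transition probability p_stay."""
    return step_seconds / (1.0 - p_stay)

STEP = 200 * 0.020  # ASSUMPTION: one state step per block of N * tau = 4 sec
A = build_transition_matrix(3, 0.99998, 0.95)
mtbf_days = mean_dwell_time(0.99998, STEP) / 86400.0
fault_minutes = mean_dwell_time(0.95, STEP) / 60.0
```

Under these assumptions the normal-state dwell time is a little over 2 days and the fault dwell time a little over 1 min, which is consistent with the approximate figures quoted in the text.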

A model was trained on normal data and on one of the 
known faults (the compensation loss fault), giving a three- 
state model (normal, known fault, and unknown fault). 
The normal and known fault likelihood models were con- 
structed using a multivariate Gaussian density where the 
mean and covariance parameters were estimated from the 
data using maximum likelihood estimators. 

Two models were constructed in this manner (one on 
each of the day 42 and day 53 data sets) and then tested on 
the independent data from the other day. The goal of the 
experiment was to see if the model could correctly identify 
data as being either normal, a known fault, or an unknown 
fault. The ability to classify data into the third unknown 
category was of particular relevance, since, as described 
earlier in this article, previously developed models did not 
have this capability, i.e., all data were classified into one 
of the known states. 

The state sequence in each test data set was as follows: normal; unknown fault
(tachometer failure); known fault (compensation loss); normal; unknown fault
(tachometer noise); and finally an unknown intermittent fault (encoder
failure). Each state lasted roughly 5 min. Figure 2 shows how one of the AR(2)
coefficients changed as a function of the underlying state (for day 53). Note
how noisy the estimates are, due in part to the fact that an AR(2) model is too
simple to capture the full dynamics of the data.

Figures 3(a) and 3(b) show the state estimation results in terms of estimated
state probabilities [as in Eq. (2)] for each of the three states in the model.
The results clearly indicate that the likelihood model has the ability to infer
the correct state of the system from the observable data. As in [3], the Markov
model adds stability to the estimates, reducing false alarms while still
allowing a rapid transition when the underlying state changes.

The important aspect of this new model is its ability to 
identify data as being of the unknown category, namely, 
between minutes 5 and 10, minutes 20 and 25, and the 
intermittent fault that occurred between minutes 25 and 
30. The response of the model is not entirely perfect. For 
example, during the test of day 42 data [Fig. 3(a)], in the 
first 5 min of normal operations, there appear to be at 
least two short false alarms, i.e., where the probability of 
normal conditions drops significantly below 1 even though 
the system is supposed to be in the normal state. This 
can be attributed to one of two possible causes: either 
the model is not quite accurate, or, more interestingly, al- 
though the system is assumed to be normal it is in fact in 
some other transient state. Closer examination of the orig- 
inal sensor data revealed that the second explanation was 
more likely to be true: the model detected the possibility 
of an unknown transient state that had not been noticed 
when these data were originally recorded. While this is a 
relatively simple example, it nonetheless demonstrates the 
basic concept of a model which can detect subtle changes 
and abnormalities in the behavior of a dynamic system — 
changes that are not noticeable to the human observer. 

It is also worth noting that the present model assigns a relatively low a
priori probability to short-lived states. An obvious extension to the model
proposed here would be to further refine the unknown state into substates based
on their temporal characteristics, i.e., intermittent, transient, or permanent.

VII. Discussion 

This article has described the basic principles behind 
the construction of dynamic system monitoring models 
that can classify system states into an “unknown” cate- 
gory. Although the basic idea is quite simple, it has some 
very useful properties. In addition, it is worth noting that 
all previous fault monitoring methods described in the lit- 
erature (of which the authors of this article are aware) im- 
plicitly assume that all system states of interest are known 
in advance. For large-scale complex systems, this is clearly 
an undesirable and unrealistic assumption. 

A possible criticism of the proposed method is the pos- 
sibly arbitrary nature by which the prior density for the 
unknown state is assigned. Certainly it must be admitted 
that this can never be a purely objective choice and re- 
quires the careful judgment of the modeller. However, any 
model is by nature the result of various implicit biases and 
subjective judgments, and hence the standard argument 
of the Bayesian school of statistical modelling can be ap- 
pealed to: if any reasonable prior information exists, then 
it is judicious to include it in the model. The astute reader 
will have noted that by simply changing the boundaries 
or constraints on the parameters of interest, the modeller 
can in effect control the detection to false alarm trade-off 
characteristic of the model [also known as the receiver op- 
erating characteristic (ROC) in signal detection theory]. 
The use of decision theoretic methods to minimize the rel- 
evant loss function (in the context of choosing the prior 
density) would seem the appropriate avenue by which to 
control this aspect of the model. 

The ability to detect new system states does not come 
without a cost. As alluded to earlier, the mapping describ- 
ing how the observed data depend on the system states 
(the likelihood) is generally more difficult to estimate than 
the mapping describing how the states depend on the ob- 
served data. Hence, for example, in the case where one has 
three known faults, a model of the type which is proposed 
here may not be as accurate in terms of discriminating 
among these faults as the types of discrimination mod- 
els that focus exclusively on these faults but which ignore 
the possibility of an unknown fault. One way to avoid 
this problem is to improve the quality of the likelihood 
modelling process. For example, a Gaussian assumption 
is often not appropriate: nonparametric density estima- 
tion methods, if used correctly, may provide more accurate 
models for the known states. 

Another possibility would be to use both likelihood models and discriminative
models as part of one overall model. Letting $p(\omega_m)$ be the posterior
probability that the data are from an unknown state (as calculated by a
likelihood model of the type described in this article), and letting the symbol
$\omega_{\{1,\ldots,m-1\}}$ denote the event that the true system state is one
of the known states, one can estimate the true posterior probability of
individual known states as

$$p(\omega_i) = p_d(\omega_i \mid \Phi_t, \omega_{\{1,\ldots,m-1\}}) \times [1 - p(\omega_m)], \qquad 1 \le i \le m - 1$$

where $p_d(\omega_i \mid \Phi_t, \omega_{\{1,\ldots,m-1\}})$ is the posterior
probability estimate of the known states as provided by a discriminative model
such as described in [2] and [3]. Note that this method does not in any way
help to improve the ability of the overall model to detect unknown states since
that estimate remains unchanged; however, in principle, it should improve the
ability of the model to distinguish between specific known states. The
possibility of improving the model described in this article by using this
particular technique has not been tested in an experimental manner at this
point.
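
The combination rule amounts to rescaling the discriminative posteriors by the probability that the state is a known one. A minimal sketch, with a hypothetical function name and made-up numbers, is:

```python
import numpy as np

def combine_posteriors(p_unknown, p_disc_known):
    """Merge a likelihood-model estimate of the unknown state with
    discriminative-model posteriors over the known states.

    p_unknown    : posterior probability of the unknown state.
    p_disc_known : discriminative posteriors over the m - 1 known states,
                   conditioned on the state being a known one (sums to 1).
    Returns a length-m vector: known states first, unknown state last.
    """
    p_disc_known = np.asarray(p_disc_known, dtype=float)
    known = p_disc_known * (1.0 - p_unknown)
    return np.append(known, p_unknown)

# Two known states; the likelihood model gives the unknown state probability 0.2.
combined = combine_posteriors(0.2, [0.7, 0.3])
```

The unknown-state entry is passed through unchanged, which mirrors the remark that this technique cannot improve detection of unknown states.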

A final comment is that the ability of the likelihood 
model to detect unknown states is necessarily limited by 
the information in the observable data. For example, al- 
though the simple AR models reported here have given 
very useful information in terms of discriminating between 
normal and various fault states, it is quite possible that a 
fault state may not be well modelled by a simple linear 
AR model, i.e., that the AR coefficients will not yield any 
useful information. Hence, in general, the use of more ro- 
bust signal characteristics should improve the model per- 
formance. 


VIII. Summary 

A new method was proposed that allows the construc- 
tion of HMM monitoring algorithms without the require- 
ment that training data for each of a prescribed set of 
faults be made available. Naturally, if such data (or equiv- 
alent prior knowledge) are available, then these data can 
also be incorporated into the new model. The proposed 
method was validated on data from the DSS-13 BWG- 
antenna pointing system. In particular, the model was 
able to detect system states that could not have been de- 
tected using previously reported methods. While there is 
still room for improvement in terms of the performance of 
this class of models, the results are nonetheless quite accu- 
rate and of significant practical importance in the context 
of monitoring 70-m antenna data where fault training data 
are unlikely to be available. 
Fig. 1. Admissible region for AR(2) parameters.

Fig. 2. Estimates of coefficient $\phi_2$ as a function of the system state
(day 53 data). State sequence shown: normal, unknown, known, normal, unknown,
unknown (intermittent).

Fig. 3. Estimates of posterior state probabilities as provided by the
likelihood and hidden Markov models: (a) training on day 53 and testing on day
42 data and (b) vice versa. State sequence shown: normal, unknown, known,
normal, unknown, unknown (intermittent).
Appendix 

Hidden Markov Model Description 


Let $\Omega$ denote the discrete-valued state variable taking values in the set
$\{\omega_1, \ldots, \omega_m\}$. A first-order discrete-time Markov model is
characterized by the assumption that

$$p[\Omega(t) \mid \Omega(t-1), \ldots, \Omega(0)] = p[\Omega(t) \mid \Omega(t-1)] \tag{A-1}$$

That is, the conditional probability of any current state given knowledge of
all previous states is the same as the conditional probability of the current
state given knowledge of the system state at time $t - 1$. This is equivalent
to the well-known assumption of “memorylessness” in that the evolution of the
system depends only on the present state and not on past states. A direct
consequence is the fact that the number of consecutive time steps that the
system spends in any given state will be a discrete random variable with a
geometric distribution.
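
The geometric dwell time can be written out explicitly. In the following, the symbol $D$ for the number of consecutive time steps spent in a state, and $a_{ii}$ for the probability of remaining in state $i$ from one step to the next, are notation introduced here for illustration:

$$P(D = n) = a_{ii}^{\,n-1}\,(1 - a_{ii}), \quad n = 1, 2, \ldots, \qquad E[D] = \frac{1}{1 - a_{ii}}$$

For example, a self-transition probability of 0.95 gives a mean dwell of $1/0.05 = 20$ time steps.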

In a standard, nonhidden Markov model, to calculate the probability that the
system is in a given state at time $t$, one needs only to know the initial
state probabilities $\pi = \{p[\omega_1(0)], \ldots, p[\omega_m(0)]\}$ and the
values $a_{ij} = p[\omega_i(t) \mid \omega_j(t-1)]$, $1 \le i, j \le m$. The
$m \times m$ matrix $A$ is known as the transition matrix and characterizes the
Markov model. The first-order Markov assumption governing state evolution in
time may appear restrictive at first glance, but has been found in practice to
be an extremely robust model for many real-world applications. In principle,
the theory for higher-order Markov models can be developed, but at the cost of
increased complexity in terms of specifying the model and of increased
computational complexity in terms of on-line calculation of the posterior
probabilities.

Under the first-order Markov assumption, it can easily be shown that the
probability that the system remains in the normal state from one instant to the
next can be expressed as

$$a_{11} = 1 - \frac{\tau}{\mathrm{MTBF}} \tag{A-2}$$

where the MTBF is the mean time between failures of the system and $\tau$ is
the time between states (both expressed in the same units). Similarly, the
other elements in the transition matrix $A$ can be estimated from information
concerning the general nature of system faults, which may be available from an
existing database or can be estimated based on known physical properties of the
system. Augmented models may have a wide variety of additional states. For
example, it may be useful to include a state to account for the transient
behavior of the system. Similarly, states which account for known operational
modes of the system, such as powered off and brakes on, may be necessary in
practice. The specification of the Markov transition matrix corresponds to the
explicit modelling of high-level prior knowledge concerning system behavior at
the state level. In particular, it does not involve the specification of prior
models for observable data over time since typically this is much more
difficult to model. This is precisely the advantage of the HMM decomposition:
the temporal behavior of the system needs only to be specified at a relatively
high level.

Denote the observed data up to time $t$ by
$\Phi_t = \{\theta(t), \ldots, \theta(0)\}$. The hidden aspect of the Markov
model is derived from the fact that the observed data are a stochastic function
of the underlying Markov states. These states are hidden in the sense that they
cannot be measured directly. It is the state identities which one wishes to
estimate; hence, the purpose of the modelling is to represent the relationship
between the states and the observable data such that the most likely state
sequence can be inferred. Figure A-1 shows an illustration of the concept for a
three-state HMM. For on-line monitoring of a dynamic system, the observed data
simply consist of observed sensor data (or derived parameters) while the states
reflect the underlying system states, in particular normal and fault
operational states.

An estimate of the instantaneous likelihood, the probability of the observed
data at time $t$ conditioned on the state variable,
$p[\theta(t) \mid \Omega(t)]$, is assumed to be known. The goal is to take
advantage of all the symptom information and to estimate
$p[\Omega(t) \mid \Phi_t]$. It is convenient to work in terms of an
intermediate variable $\alpha$, where

$$\alpha_i(t) = p[\omega_i(t), \Phi_t] \tag{A-3}$$

To find the posterior probabilities of interest, it is sufficient to be able to
calculate the $\alpha$'s at any time $t$ since by Bayes' rule

$$p[\omega_i(t) \mid \Phi_t] = \frac{\alpha_i(t)}{p(\Phi_t)} = \frac{\alpha_i(t)}{\sum_{j=1}^{m} \alpha_j(t)} \tag{A-4}$$


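The derivation of a recursive estimate follows:

$$\alpha_i(t) = \sum_{j=1}^{m} p\big(\omega_i(t), \Phi_t, \omega_j(t-1)\big) = \sum_{j=1}^{m} p\big(\omega_i(t), \theta(t), \Phi_{t-1}, \omega_j(t-1)\big)$$

$$= \sum_{j=1}^{m} p\big(\omega_i(t), \theta(t) \mid \Phi_{t-1}, \omega_j(t-1)\big)\, \alpha_j(t-1)$$

by the definition of $\alpha_j$

$$= \sum_{j=1}^{m} p\big(\theta(t) \mid \omega_i(t), \Phi_{t-1}, \omega_j(t-1)\big)\, p\big(\omega_i(t) \mid \Phi_{t-1}, \omega_j(t-1)\big)\, \alpha_j(t-1)$$

$$= \sum_{j=1}^{m} p\big(\theta(t) \mid \omega_i(t)\big)\, p\big(\omega_i(t) \mid \Phi_{t-1}, \omega_j(t-1)\big)\, \alpha_j(t-1)$$

assuming that $\theta(t)$ is independent of past observations and past states,
given the present state

$$= \sum_{j=1}^{m} p\big(\theta(t) \mid \omega_i(t)\big)\, p\big(\omega_i(t) \mid \omega_j(t-1)\big)\, \alpha_j(t-1)$$

assuming that $\omega_i(t)$ is independent of past observations given the past
states

$$= p\big(\theta(t) \mid \omega_i(t)\big) \sum_{j=1}^{m} a_{ij}\, \alpha_j(t-1) \tag{A-5}$$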


The first term is the likelihood (assumed to be known). The terms in the sum
are just a linear combination of the $\alpha$'s from the previous time step.
Hence, Eq. (A-5) provides the basic recursive relationship for estimating state
probabilities at any time $t$.
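
The recursion of Eq. (A-5), followed by the normalization of Eq. (A-4), can be sketched in a few lines. The function name, the array layout, and the toy numbers are assumptions for illustration. Rescaling at every step does not change the posteriors, since Eq. (A-4) divides by the sum anyway, and it avoids numerical underflow on long sequences; the sketch also assumes that at least one state has nonzero likelihood at every step.

```python
import numpy as np

def forward_filter(likelihoods, A, prior):
    """Posterior state probabilities given data up to each time step.

    likelihoods : (T, m) array; entry [t, i] is the likelihood of the
                  observed parameters at step t under state i.
    A           : (m, m) transition matrix, A[i, j] = p(state i | previous j).
    prior       : length-m vector of initial state probabilities.
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    T, m = likelihoods.shape
    posteriors = np.empty((T, m))
    alpha = np.asarray(prior, dtype=float)
    for t in range(T):
        # Sum over previous states: sum_j a_ij * alpha_j(t-1).
        predicted = alpha if t == 0 else A @ alpha
        alpha = likelihoods[t] * predicted   # Eq. (A-5), up to a scale factor
        alpha = alpha / alpha.sum()          # Eq. (A-4) normalization
        posteriors[t] = alpha
    return posteriors

# Toy two-state example (made-up numbers): evidence favours state 0 twice,
# then strongly favours state 1.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
lik = np.array([[0.8, 0.2], [0.8, 0.2], [0.1, 0.9]])
post = forward_filter(lik, A, prior=[0.5, 0.5])
```

In the monitoring application, each row of `likelihoods` would be filled from the state likelihood models evaluated at the current block of derived parameters.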

The additional assumptions made in the derivation of Eq. (A-5) (besides the
first-order Markov assumption on state dependence) require some comment. The
first assumption is that $\theta(t)$ is independent of both the previous state
and the observed past data, given that the present state is known. This implies
that the observed symptoms are assumed to be statistically independent from one
time window to the next, given the state information. This will generally be
true when the values of $\theta(t)$ consist of derived parameters and $\tau$ is
much greater than any significant time constants of the dynamic system. Even if
it is known that the $\theta$'s exhibit temporal correlations, this can also in
principle be modelled directly in Eq. (A-5), although the model will now be
much more complex. The second assumption, that the present state only depends
on the previous state but not the past observations, simply reflects the causal
relationship between symptoms and states.

Note that the method described above only calculates 
the state probabilities based on past information. Alter- 
native estimation strategies are possible. For example, us- 
ing the well-known forward-backward recurrence relations 
[9], one can update the state probability estimates using 
symptom information which occurred later in time. 
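
As an illustration of this alternative, the sketch below applies the standard scaled forward-backward recurrences to obtain smoothed state probabilities that use the whole observed sequence. The function name, array layout, and toy numbers are assumptions; this is a generic textbook form of the recurrences, not a procedure evaluated in this article, and it uses the same convention that element (i, j) of the transition matrix is the probability of state i given previous state j.

```python
import numpy as np

def forward_backward(likelihoods, A, prior):
    """Return (filtered, smoothed) state probabilities, each of shape (T, m)."""
    likelihoods = np.asarray(likelihoods, dtype=float)
    T, m = likelihoods.shape
    alpha = np.empty((T, m))
    beta = np.ones((T, m))

    # Forward pass: uses data up to and including step t.
    a = np.asarray(prior, dtype=float) * likelihoods[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = likelihoods[t] * (A @ alpha[t - 1])
        alpha[t] = a / a.sum()

    # Backward pass: folds in data observed after step t.
    for t in range(T - 2, -1, -1):
        b = A.T @ (likelihoods[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    gamma = alpha * beta
    gamma = gamma / gamma.sum(axis=1, keepdims=True)
    return alpha, gamma

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
lik = np.array([[0.8, 0.2], [0.8, 0.2], [0.1, 0.9]])
filtered, smoothed = forward_backward(lik, A, prior=[0.5, 0.5])
```

At the final time step the smoothed and filtered estimates coincide, since no later data exist; at earlier steps the smoothed estimate is revised by the evidence that followed, which makes this form suited to off-line analysis rather than on-line monitoring.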
Fig. A-1. An illustrative example of a three-state hidden Markov model,
showing the observable sequence $\theta(t-1)$, $\theta(t)$, $\theta(t+1)$,
$\theta(t+2)$ and the hidden states.